翻訳と辞書
Words near each other
・ Flairck
・ Flairjet
・ Flaite
・ Flaith
・ Flaithbertach
・ Flaithbertach mac Inmainén
・ Flaithbertach mac Loingsig
・ Flaithbertach Ua Néill
・ Flaithbertaigh Ua Flaithbertaigh
・ Flaithem Mac Mael Gaimrid
・ Flaithnia mac Flainn
・ Flaithniadh mac Congal
・ Flaithrí mac Domnaill
・ Flaiz Adventist College
・ Flajelata
Flajolet–Martin algorithm
・ FlaK
・ Flak 'n' Flight
・ Flak (disambiguation)
・ Flak (Transformers)
・ Flak Attack
・ Flak Bait
・ Flak corps
・ Flak jacket
・ Flak Magazine
・ Flak tower
・ Flak-Kaserne Ludwigsburg
・ Flaka e Janarit
・ Flaka Krelani
・ Flaka Surroi


Dictionary Lists
翻訳と辞書 辞書検索 [ 開発暫定版 ]
スポンサード リンク

Flajolet–Martin algorithm : ウィキペディア英語版
Flajolet–Martin algorithm

The Flajolet–Martin algorithm is an algorithm for approximating the number of distinct elements in a stream with a single pass and space-consumption which is logarithmic in the maximum number of possible distinct elements in the stream. The algorithm was introduced by Philippe Flajolet and G. Nigel Martin in their 1984 paper "Probabilistic Counting Algorithms for Data Base Applications". Later it has been refined in the papers "LogLog counting of large cardinalities" by Marianne Durand and Philippe Flajolet, and "HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm" by Philippe Flajolet et al.
In their 2010 paper "An optimal algorithm for the distinct elements problem", Daniel M. Kane, Jelani Nelson and David P. Woodruff gives an improved algorithm which uses nearly optimal space, and has optimal ''O''(1) update and reporting times.
==The algorithm==
Assume that we are given a hash function hash(x) which maps input x to integers in the range (2^L - 1 ) and where the outputs are sufficiently uniformly distributed. Note that the set of integers from 0 to 2^L-1 corresponds to the set of binary strings of length L. For any non-negative integer y, define bit(y,k) to be the k-th bit in the binary representation of y, such that:

y = \sum_ \text(y,k) 2^k

We then define a function \rho(y) which outputs the position of the least significant 1-bit in the binary representation of y:
\rho(y) = \min_ \text(y,k) \neq 0
where \rho(0) = L. Note that with the above definition we are using 0-indexing for the positions. For example, \rho(13) = \rho(1101) = 0 since the least significant bit is a 1, and \rho(8) = \rho(0100) = 2 since the least significant 1-bit is at the third position. At this point, note that under the assumption that the output of our hash-function is uniformly distributed, then the probability of observing a hash-output ending with 2^k (a one, followed by k zeroes) is 2^ since this corresponds to flipping k heads and then a tail with a fair coin.
Now the Flajolet–Martin algorithm for estimating the cardinality of a multiset M is as follows:
# Initialize a bit-vector BITMAP to be of length L and contain all 0's.
# For each element x in M:
## index = \rho(\text(x)).
## BITMAP() = 1.
# Let R denote the largest index i such that BITMAP() = 0.
# Estimate the cardinality of M as 2^R / \phi where \phi \approx 0.77351.
The idea is that if n is the number of distinct elements in the multiset M, then BITMAP() is accessed approximately n/2 times, BITMAP() is accessed approximately n/4 times and so on. Consequently if i \gg \log_2 n then BITMAP() is almost certainly 0, and if i \ll \log_2 n then BITMAP() is almost certainly 1. If i \approx \log_2 n then BITMAP() can be expected to be either 1 or 0.
The correction factor \phi \approx 0.77351 is found by calculations which can be found in the original paper.

抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)
ウィキペディアで「Flajolet–Martin algorithm」の詳細全文を読む



スポンサード リンク
翻訳と辞書 : 翻訳のためのインターネットリソース

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.